An Audio-Visual Imposture Scenario by Talking Face Animation
Authors
Abstract
With the advent of PDAs, handheld PCs, and mobile telephones that use biometric recognition for user authentication, there is a growing demand for automatic, non-intrusive voice and face speaker verification systems. Such systems can be embedded in mobile devices to allow biometrically recognized users to sign and send data electronically, and to give their telephone conversations legal value. The European project "Secure Contracts Signed by Mobile Phone" (SecurePhone) aims at developing such technology on a 3G/B3G-enabled PDA. One of the risks a speaker verification system faces is its vulnerability to imposture. With the current communication infrastructure lacking strong user identification, impostors aware of legal transactions can interfere in a telephone conversation so as to alter or replace the true conversation, or even initiate a conversation while impersonating another person. To combat imposture, it is necessary to study imposture techniques and scenarios. In this paper, we implement a system that allows an impostor to start and lead an audio-visual telephone conversation, and to sign and exchange data electronically on behalf of another person. During the conversation, the impostor's audio and video are altered so as to mimic the other person's voice and face. On the speech side, there exist processing techniques exploitable by impostors to reproduce the voice of an authorized client. In particular, speech segments obtained from a client's recordings can be used to synthesize new sentences that the client never pronounced. We explain how a very-low-bit-rate speech coding system, such as the ALISP-based one, can be adapted to serve forgery purposes, transforming any input speech into the client's voice. On the face side, the impostor's talking face is detected and facial features are extracted and tracked. Lip movements are used to animate a synthetic talking face (Greta). The texture of the impersonated face is mapped onto Greta and coded for transmission over the phone, along with the synthesized voice. Audio-visual coding and synthesis are realized by indexing a memory of audio-visual sequences; stochastic models (coupled HMMs) of characteristic segments drive the search in memory.
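To make the unit-selection forgery idea concrete, here is a minimal sketch: segments of the impostor's speech, already labeled by an ALISP-like recognizer (assumed upstream), are each replaced by the closest matching segment harvested from the client's recordings. All names here (forge_voice, dtw_distance, the segment and memory formats) are illustrative assumptions, not the SecurePhone or ALISP codebase.

```python
# Sketch of ALISP-style unit-selection voice forgery.
# Hypothetical interfaces; not the authors' actual implementation.
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Dynamic-time-warping distance between two feature sequences
    (frames x dims), used to pick the closest memorized segment."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)

def forge_voice(input_segments, client_memory):
    """Replace each recognized unit of the impostor's speech with the
    closest client segment carrying the same unit label.

    input_segments: list of (unit_label, feature_matrix) pairs produced
        by an ALISP-like recognizer run on the impostor's speech.
    client_memory: dict mapping unit_label -> list of client feature
        matrices harvested from the client's recordings.
    Returns the concatenated client-voice feature sequence, ready for
    waveform synthesis by a vocoder.
    """
    output = []
    for label, feats in input_segments:
        candidates = client_memory.get(label, [])
        if not candidates:
            # Unit never seen in the client's recordings: keep the
            # impostor's own segment as a fallback.
            output.append(feats)
            continue
        best = min(candidates, key=lambda seg: dtw_distance(feats, seg))
        output.append(best)
    return np.vstack(output)
```

In the full scenario described above, the output feature sequence would feed a vocoder, while a parallel stream of segment indices drives the Greta face animation; the coupled-HMM search mentioned in the abstract would replace the naive nearest-segment loop used in this sketch.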
Similar resources
Audio-Visual Identity Verification and Robustness to Imposture
The robustness of talking-face identity verification (IV) systems is best evaluated by monitoring their behavior under impostor attacks. We propose a scenario where the impostor uses a still face picture and a sample of speech of the genuine client to transform his/her speech and visual appearance into that of the target client. We propose MixTrans, an original text-independent technique for vo...
A Cantonese Speech-Driven Talking Face Using Translingual Audio-to-Visual Conversion
This paper proposes a novel approach towards a videorealistic, speech-driven talking face for Cantonese. We present a technique that realizes a talking face for a target language (Cantonese) using only audio-visual facial recordings for a base language (English). Given a Cantonese speech input, we first use a Cantonese speech recognizer to generate a Cantonese syllable transcription. Then we ma...
Real-time speech-driven face animation with expressions using neural networks
A real-time speech-driven synthetic talking face provides an effective multimodal communication interface in distributed collaboration environments. Nonverbal gestures such as facial expressions are important to human communication and should be considered by speech-driven face animation systems. In this paper, we present a framework that systematically addresses facial deformation modeling, au...
Real-Time Speech-Driven 3D Face Animation
In this paper, we present an approach for real-time speech-driven 3D face animation using neural networks. We first analyze a 3D facial movement sequence of a talking subject and learn a quantitative representation of the facial deformations, called the 3D Motion Units (MUs). A 3D facial deformation can be approximated by a linear combination of the MUs weighted by the MU parameters (MUPs) – th...
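As a quick illustration of the linear-combination model described in the snippet above (the notation here is ours, not the cited paper's), a 3D facial deformation $\mathbf{d}$ is approximated as

$$\mathbf{d} \approx \sum_{i=1}^{K} c_i\,\mathbf{m}_i,$$

where $\mathbf{m}_1,\dots,\mathbf{m}_K$ are the learned 3D Motion Units (MUs) and $c_i$ are the MU parameters (MUPs) estimated for each frame.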
Facial Expression Synthesis Based on Emotion Dimensions for Affective Talking Avatar
Facial expression is one of the most expressive ways for human beings to deliver their emotion, intention, and other nonverbal messages in face to face communications. In this chapter, a layered parametric framework is proposed to synthesize the emotional facial expressions for an MPEG4 compliant talking avatar based on the three dimensional PAD model, including pleasure-displeasure, arousal-no...
Publication date: 2004